Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Highlighting: dedup field names when using wildcard expression to define which fields to highlight #10916

Closed

Conversation

javanna
Copy link
Member

@javanna javanna commented May 1, 2015

When specifying a wildcard expression to define which fields to highlight, we retrieve the matching field names from the mappers. The returned list of fields may contain duplicates, which is why we should make sure we don't go over the same field more than once. This is just to avoid performing the same task multiple times, but doesn't change the result of highlighting.

@jpountz
Copy link
Contributor

jpountz commented May 1, 2015

Should the fix be in simpleMatchToFullName instead? Also maybe we should leave a comment explaining that we need to deduplicate fields today but this might not be necessary anymore if we store mappings at the index level?

@javanna
Copy link
Member Author

javanna commented May 1, 2015

Should the fix be in simpleMatchToFullName instead?

good question, I was wondering myself, to be honest I have no clue :) I do have the feeling that I might be hiding some other issue with this fix, not sure.

@javanna
Copy link
Member Author

javanna commented May 1, 2015

One other question could be if we should go ahead and try to highlight tons of metadata fields, like _routing, _ttl and such for instance when using *. I noticed this while looking at #10914 .

@jpountz
Copy link
Contributor

jpountz commented May 1, 2015

Let's ping @rjernst who might have ideas/opinions here as this is related to other refactorings he is performing.

@clintongormley
Copy link

@javanna re

One other question could be if we should go ahead and try to highlight tons of metadata fields, like _routing, _ttl and such for instance when using *.

We have an adoptme issue for that: #9881

@nik9000
Copy link
Member

nik9000 commented May 18, 2015

Should the fix be in simpleMatchToFullName instead? Also maybe we should leave a comment explaining that we need to deduplicate fields today but this might not be necessary anymore if we store mappings at the index level?

probably

@@ -76,12 +75,12 @@ public boolean hitExecutionNeeded(SearchContext context) {
public void hitExecute(SearchContext context, HitContext hitContext) {
Map<String, HighlightField> highlightFields = newHashMap();
for (SearchContextHighlight.Field field : context.highlight().fields()) {
List<String> fieldNamesToHighlight;
Set<String> fieldNamesToHighlight;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not add

if (highlightFields.contains(fieldName)) { continue; }

around line 94?

That'll prevent two regexes that find the same field from highlighting it twice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can the same field be highlighted twice if we use a set that doesn't allow for duplicates? not sure what I'm missing here. Anyway, agree that the problem is probably in mappings code, but I am not familiar with it, somebody else will have to jump in on that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking something like:

"highlight": {
  "cat_*": <other stuff>,
  "*": <stuff>
}

As is now the last matching regex wins but it does so by highlighting the field twice. That's what I meant.

@javanna
Copy link
Member Author

javanna commented May 27, 2015

@rjernst any idea about whether simpleMatchToFullName should return duplicates or not? Should we fix the problem there (unless it was fixed already and I missed it) or in the highlighters code?

@rjernst
Copy link
Member

rjernst commented May 27, 2015

@javanna Yes, please make the fix in simpleMatchToFullName. It should be simpler in the future and not need a set (because only the full names will need to be checked).

javanna added a commit to javanna/elasticsearch that referenced this pull request May 27, 2015
… & `simpleMatchToIndexNames` in FieldMappersLookup

Relates to elastic#10916
@javanna
Copy link
Member Author

javanna commented May 27, 2015

Superseded by #11377

@javanna javanna closed this May 27, 2015
@kevinkluge kevinkluge removed the review label May 27, 2015
javanna added a commit to javanna/elasticsearch that referenced this pull request May 27, 2015
… & `simpleMatchToIndexNames` in FieldMappersLookup

Relates to elastic#10916
Closes elastic#11377
javanna added a commit to javanna/elasticsearch that referenced this pull request May 27, 2015
… & `simpleMatchToIndexNames` in FieldMappersLookup

Relates to elastic#10916
Closes elastic#11377
javanna added a commit to javanna/elasticsearch that referenced this pull request May 28, 2015
… & `simpleMatchToIndexNames` in FieldMappersLookup

Relates to elastic#10916
Closes elastic#11377
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Highlighting How a query matched a document
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants